Abstract: Collaborative knowledge base (KB) authoring environments are critical for the
construction of high-performance KBs. Such environments must support rapid construction
of KBs by a collaborative effort of teams of knowledge engineers through reuse of existing
knowledge and software components. They should support the manipulation of knowledge
by diverse problem-solving engines even if that knowledge is encoded in different
languages and by different researchers. They should support large KBs and provide a
scalable and interoperable development infrastructure. In this paper, we present an
environment that satisfies many of these goals.



V. K. Chaudhri, A. Farquhar, R. Fikes, P. D. Karp, and J. P. Rice, "OKBC: A programmatic foundation
for knowledge base interoperability," in Proceedings of the AAAI-98, (Madison, WI), 1998.

Abstract: The technology for building large knowledge bases (KBs) is yet to witness a
breakthrough so that a KB can be constructed by the assembly of prefabricated knowledge
components. Most of the current KB development tools can only manipulate knowledge
residing in the knowledge representation system (KRS) for which the tools were originally
developed. Open Knowledge Base Connectivity (OKBC) is an application programming
interface for accessing KRSs, and was developed to enable the construction of reusable KB
tools. OKBC improves upon its predecessor, the Generic Frame Protocol (GFP), in several
significant ways. In this paper, we discuss technical design issues faced in the development
of OKBC, highlight how OKBC improves upon GFP, and report on practical experiences in
using it.

P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole, and M. Krummenacker, "[title illegible]"
Abstract: The Encyclopedia of E. coli Genes and Metabolism (EcoCyc) is a database that
combines information about the genome and the intermediary metabolism of E. coli. It
describes -970 genes of E. coli, 547 enzymes encoded by these genes, 70 metabolic
reactions that occur in E. coli, and the organization of these reactions into 107 metabolic
pathways. The EcoCyc graphical user interface allows scientists to query and explore the
EcoCyc database using visualization tools such as genomic-map browsers and automatic
layouts of metabolic pathways. EcoCyc spans the space from sequence to function to allow
scientists to investigate an unusually broad range of questions. EcoCyc can be thought of as
both an electronic review article because of its copious references to the primary literature,
and as an in silico model of E. coli metabolism that can be probed and analyzed through
computational means.

V. K. Chaudhri and P. D. Karp, "Querying Schema Information," Proceedings of the 4th International
Workshop Knowledge Representation Meets Databases (KRDB'97), pp. 4-1 to 4-6, 1997.

Abstract: Schema queries can play an important role while retrieving information from
multiple sources, for example, in query formulation and in query optimization. We identify
four classes of schema queries that we have found useful while designing an application
programming interface for frame representation systems (FRSs): taxonomic, frame
structure, constraint, and class comparison queries. We propose a scheme for direct support
for these queries in a mediator language such as Object Query Language (OQL).
Abstract: The bioinformatics community is becoming increasingly reliant on the creation of
links among biological databases (DBs) as a foundation for DB interoperability. For
example, a link might be created from a protein in one DB (such as PIR) to a gene in
another DB (such as GDB), by storing the unique identifier (id) of the gene object within an
attribute of the protein object. User interfaces can then support navigation from the protein
to the gene, and multiDB queries can join the protein with the gene. The unique id of the
gene is serving as a foreign key. However, a variety of factors, such as changes in the
underlying biology, can cause object ids to become invalid, thus producing invalid links
among DBs. Invalid links are a violation of multidatabase referential integrity. We propose
a network protocol whereby a database administrator can provide information about changes
to the identifiers of objects in their database via the Internet, to allow other databases to
maintain referential integrity. We request comments from the bioinformatics community for
the purpose of building a consensus on the proposed protocol.
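
The idea of repairing links from a published list of identifier changes can be illustrated with a small sketch. This is not the protocol proposed in the paper: the record layout (`IdChange`), the function name, and the sample identifiers are all hypothetical, chosen only to show how a consuming database might replay such a change log against its stored foreign keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdChange:
    """One published identifier change (hypothetical record layout)."""
    old_id: str
    new_id: Optional[str]  # None: the object was removed and has no successor


def apply_id_changes(links, changes):
    """Replay a change log, in order, against stored links.

    links maps a local object id to the remote id it references.
    Returns (updated_links, broken), where broken lists local ids whose
    remote target was removed.
    """
    updated = dict(links)
    broken = []
    for change in changes:
        for local_id, remote_id in list(updated.items()):
            if remote_id != change.old_id:
                continue
            if change.new_id is None:
                del updated[local_id]
                broken.append(local_id)
            else:
                updated[local_id] = change.new_id
    return updated, broken


# Made-up identifiers, for illustration only.
links = {"PIR:A001": "GDB:100", "PIR:A002": "GDB:200", "PIR:A003": "GDB:300"}
log = [
    IdChange("GDB:100", "GDB:101"),  # renamed
    IdChange("GDB:101", "GDB:150"),  # renamed again later
    IdChange("GDB:200", None),       # removed
]
updated, broken = apply_id_changes(links, log)
```

Replaying the log in order matters: a link that pointed at an id renamed twice ends up at the final id, and a link whose target was removed is reported rather than silently kept.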

P. Karp, M. Riley, S. Paley, and A. Pellegrini-Toole, "EcoCyc: Electronic encyclopedia of E. coli genes
and metabolism," Nuc. Acids Res., vol. 24, no. 1, pp. 32-40, 1996.

Abstract: The Encyclopedia of E. coli Genes and Metabolism (EcoCyc) is a database that
combines information about the genome and the intermediary metabolism of E. coli. It 
describes 2034 genes of E. coli, 306 enzymes encoded by these genes, 580 metabolic
reactions that occur in E. coli, and the organization of these reactions into 100 metabolic
pathways. The EcoCyc graphical user interface allows scientists to query and explore the 
EcoCyc database using visualization tools such as genomic-map browsers and automatic 
layouts of metabolic pathways. EcoCyc spans the space from sequence to function to allow
scientists to investigate an unusually broad range of questions. EcoCyc can be thought of as
both an electronic review article because of its copious references to the primary literature, 
and as an in silico model of E. coli that can be probed and analyzed through computational 
means. 
Intelligent Systems for Molecular Biology, (Menlo Park, CA), AAAI Press, 1996.

Abstract: We present a methodology for predicting the metabolic pathways of an organism
from its genomic sequence by reference to a knowledge base of known metabolic pathways.
We applied these techniques to the genome of H. influenzae by reference to the EcoCyc
knowledge base to predict which of 81 metabolic pathways of E. coli are found in H.
influenzae. The resulting prediction is a complex hypothesis that is presented in computer
form as HinCyc: an electronic encyclopedia of the genes and metabolic pathways of H.
influenzae. HinCyc connects the predicted genes, enzymes, enzyme-catalyzed reactions, and
biochemical pathways in a WWW-accessible knowledge base to allow scientists to explore
this complex hypothesis.

S. M. Paley and P. D. Karp, "Adapting CLIM applications for use on the World Wide Web," in
Proceedings of the Association of Lisp Users Meeting and Workshop, pp. 1-9, 1995.

Abstract: The World Wide Web (WWW) offers the potential to deliver specialized
information to an audience of unprecedented size. Along with this exciting new opportunity,
however, comes a challenge for software developers: instead of rewriting our software
applications to operate over the WWW, how can we maximize software reuse by retrofitting
existing applications? We have developed a Web server tool, written in Common Lisp, that
allows any existing graphical user interface application written using the Common Lisp
Interface Manager (CLIM) to hook easily into the WWW. This tool, CWEST
(CLIM-WEb Server Tool, pronounced "quest"), has been developed to operate with
EcoCyc, an electronic encyclopedia of genes and metabolism of the bacterium E. coli.
EcoCyc consists of a database of objects relevant to E. coli biochemistry and a sophisticated
interface, implemented in CLIM, that runs on the local host window system and generates
graphical displays appropriate to each type of object. Each query to our server is passed as a
command to the EcoCyc program, which responds by dynamically generating an appropriate
local drawing. That drawing, which can be a mixture of text and graphics, is then translated
into the HyperText Markup Language (HTML) and/or the Graphics Interchange Format
(GIF) and returned to the client. Sensitive regions embedded in the CLIM drawing are
converted to hyperlinks with Universal Resource Locators (URLs) that generate further
EcoCyc queries. This tight coupling of CLIM output with Web output makes CLIM an ideal
high-level programming tool for Web applications. The flexibility of Common Lisp and
CLIM made implementation of the server tool surprisingly easy, requiring few changes to
the existing EcoCyc program. The results can be seen at URL
http://www.ai.sri.com/ecocyc/browser.html. We plan to make CWEST available to the
CLIM community at large, with the hope that it will spur other software developers to make
their CLIM applications available over the WWW.

P. D. Karp, K. Myers, and T. Gruber, "The generic frame protocol," in Proceedings of the 1995
International Joint Conference on Artificial Intelligence, pp. 768-774, 1995.

Abstract: The Generic Frame Protocol (GFP) is an application program interface for
accessing knowledge bases stored in frame knowledge representation systems (FRSs). GFP
provides a uniform model of FRSs based on a common conceptualization of frames, slots,
facets, and inheritance. GFP consists of a set of Common Lisp functions that provide a
generic interface to underlying FRSs. This interface isolates an application from many of the
idiosyncrasies of specific FRS software and enables the development of generic tools (e.g.,
graphical browsers, frame editors) that operate on many FRSs. To date, GFP has been used
as an interface to Loom, Ontolingua, Theo, and Sipe.
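
The benefit of a generic interface over several frame systems can be sketched in a few lines. The following is only an illustration in Python, not GFP itself (which the abstract describes as a set of Common Lisp functions): the class names, method names, and the `is-a` slot convention are assumptions made for this example. Two toy backends store the same knowledge differently, and one generic tool works unchanged on both.

```python
from abc import ABC, abstractmethod


class FrameStore(ABC):
    """Hypothetical generic interface; method names are not GFP's."""

    @abstractmethod
    def slot_values(self, frame, slot):
        """Return the values stored locally on frame for slot."""

    @abstractmethod
    def parents(self, frame):
        """Return the direct parent frames of frame."""


class DictStore(FrameStore):
    """Backend A: each frame is a dict holding its slots and parents."""

    def __init__(self, frames):
        self.frames = frames

    def slot_values(self, frame, slot):
        return list(self.frames.get(frame, {}).get("slots", {}).get(slot, []))

    def parents(self, frame):
        return list(self.frames.get(frame, {}).get("parents", []))


class TripleStore(FrameStore):
    """Backend B: the same knowledge as (frame, slot, value) triples."""

    def __init__(self, triples):
        self.triples = triples

    def slot_values(self, frame, slot):
        return [v for f, s, v in self.triples if f == frame and s == slot]

    def parents(self, frame):
        # Assumed convention: parent links are stored under an "is-a" slot.
        return self.slot_values(frame, "is-a")


def inherited_values(store, frame, slot):
    """Generic tool: nearest values for slot, searching up parent links."""
    queue, seen = [frame], set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        values = store.slot_values(current, slot)
        if values:
            return values
        queue.extend(store.parents(current))
    return []


dict_kb = DictStore({
    "enzyme": {"slots": {"kind": ["protein"]}, "parents": []},
    "kinase": {"slots": {}, "parents": ["enzyme"]},
})
triple_kb = TripleStore([
    ("enzyme", "kind", "protein"),
    ("kinase", "is-a", "enzyme"),
])
```

Because `inherited_values` calls only the two interface methods, it knows nothing about how either backend stores its frames, which is the isolation property the abstract attributes to the generic interface.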
573-586, 1995.

Abstract: To realize the full potential of biological databases (DBs) requires more than the
interactive hypertext flavor of database interoperation that is now so popular in the
bioinformatics community. Interoperation based on declarative queries to multiple
network-accessible databases will support analyses and investigations that are orders of
magnitude faster and more powerful than what can be accomplished through interactive
navigation. I present a vision of the capabilities that a query-based interoperation
infrastructure should provide, and identify assumptions underlying, and requirements of,
this vision. I then propose an architecture for query-based interoperation that includes a
number of novel components of an information infrastructure for molecular biology. These
components include: a knowledge base that describes relationships among the
conceptualizations used in different biological databases; a module that can determine the
DBs that are relevant to a particular query; a module that can translate a query and its results
from one conceptualization to another; a collection of DB drivers that provide uniform
physical access to different database management systems; a suite of translators that can
interconvert among different database schema languages; and a database that describes the
network location and access methods for biological databases. A number of the components
are translators that bridge the heterogeneities that exist between biological DBs at several
different levels, including the conceptual level, the data model, the query language, and data
formats.

P. D. Karp and S. M. Paley, "Knowledge representation in the large," in Proceedings of the 1995
International Joint Conference on Artificial Intelligence, pp. 751-758, 1995.

Abstract: Frame knowledge representation systems lack two important capabilities that
prevent them from scaling up to large applications: they do not support fast access to large
knowledge bases (KBs), nor do they provide concurrent multiuser access to shared KBs. We
describe the design and implementation of a storage subsystem that submerges a database
management system (DBMS) within a knowledge representation system. The storage
subsystem incrementally loads referenced frames from the DBMS, and can save only those
frames that have been updated in a given session to the DBMS. We present experimental
results that show our approach to be an improvement over the use of flat files, and that
evaluate several variations of our approach.

Keywords: knowledge representation 
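
The two behaviors named in the abstract above, loading frames only when they are referenced and writing back only the frames updated in a session, can be sketched as follows. This is an illustration under stated assumptions, not the storage subsystem from the paper: SQLite stands in for the DBMS, frames are stored as JSON text, and the `FrameCache` class and its table layout are hypothetical.

```python
import json
import sqlite3


class FrameCache:
    """Loads frames on first reference; saves only frames modified this session."""

    def __init__(self, conn):
        self.conn = conn
        self.loaded = {}    # frames fetched so far, by name
        self.dirty = set()  # names of frames modified since the last save

    def get(self, name):
        if name not in self.loaded:
            row = self.conn.execute(
                "SELECT body FROM frames WHERE name = ?", (name,)
            ).fetchone()
            if row is None:
                raise KeyError(name)
            self.loaded[name] = json.loads(row[0])
        return self.loaded[name]

    def put_slot(self, name, slot, value):
        self.get(name)[slot] = value
        self.dirty.add(name)

    def save(self):
        """Write back only the dirty frames; return how many were written."""
        rows = [(json.dumps(self.loaded[n]), n) for n in sorted(self.dirty)]
        self.conn.executemany("UPDATE frames SET body = ? WHERE name = ?", rows)
        self.conn.commit()
        self.dirty.clear()
        return len(rows)


# A toy KB of 1000 frames held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (name TEXT PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO frames VALUES (?, ?)",
    [(f"frame-{i}", json.dumps({"id": i})) for i in range(1000)],
)
conn.commit()

cache = FrameCache(conn)
cache.put_slot("frame-7", "label", "updated")  # loads one frame and modifies it
cache.get("frame-8")                           # loads a second frame, read only
written = cache.save()                         # writes back the one modified frame
```

In this session only two of the 1000 frames are ever read into memory and only one is written back, whereas a flat-file KB would typically be read and rewritten as a whole.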

P. Karp and M. Mavrovouniotis, "Representing, analyzing, and synthesizing biochemical pathways,"
IEEE Expert, vol. 9, no. 2, pp. 11-21, 1994.

Abstract: Living cells are complex systems whose growth and existence depends on
thousands of biochemical reactions. A subset of these reactions - the metabolism -
interconverts small molecules. A variety of computational problems arise in representing
knowledge of the metabolism in electronic form, in analyzing that knowledge to gain deeper
insights into complexities of the metabolism, and in using such knowledge in biology,
biotechnology, and health applications. These problems provide a rich set of opportunities
for exploiting existing AI techniques, and challenges for developing new and improved
techniques. This article describes challenges and opportunities for addressing computational
problems in the metabolism with techniques from knowledge representation, planning,
integration of heterogeneous databases, qualitative reasoning, knowledge acquisition, and
machine learning. The computational problems include construction of large shared
knowledge bases of biochemical pathways, knowledge acquisition from the biochemical
literature, qualitative simulation of metabolic pathways, thermodynamic estimation,
synthesis of metabolic pathways, and scientific hypothesis formation.

Annotation: This online version was edited heavily to produce the version published in
IEEE Expert.

P. Karp and S. M. Paley, "Automated drawing of metabolic pathways," in Third International
Conference on Bioinformatics and Genome Research (H. Lim, C. Cantor, and R. Robbins, eds.), 1994.

Abstract: The EcoCyc system consists of a knowledge base that describes the genes and
intermediary metabolism of E. coli, and a graphical user interface (GUI) for accessing that
knowledge. This paper presents algorithms for drawing metabolic pathways by dynamically
querying the underlying knowledge base. These algorithms provide a foundation for
building graphical user interfaces to metabolic databases. Pathway drawing is a graph-layout
problem. Our algorithms draw pathways of several different topologies, including linear,
cyclic, and branching pathways, as well as larger groupings of such pathways. The
algorithms provide several visual presentations of metabolic pathways; for example,
compounds can be drawn as names and/or chemical structures, and enzyme names and side
compounds can be drawn or omitted. The GUI also provides several facilities for navigating
in the space of biochemical pathways, such as traversing connections between pathways,
and exploding or collapsing a pathway to include or exclude neighboring pathways.

P. D. Karp, J. D. Lowrance, T. M. Strat, and D. L. Wilkins, "The Grasper-CL graph management
system," LISP and Symbolic Computation, vol. 7, pp. 245-282, 1994.

Abstract: Graphs are virtually ubiquitous in programming applications. Moreover, 
graph-structured information is especially prevalent in AI applications. We can enhance
programs that manipulate graph-structured information by providing these programs with
graphical user interfaces that draw graphs, and that allow users to interact with drawings of
graph nodes and edges. Grasper-CL is a Common Lisp system for manipulating and
displaying graphs. Grasper-CL defines a graph abstract datatype and an extensive set of
associated operations for creating, modifying, and interrogating graphs, and for saving them
persistently. The system draws graphs using CLIM (the Common Lisp Interface Manager),
and can create PostScript renditions of its drawings. Grasper-CL supports a wide variety of
graphic styles for drawing graph nodes and edges. The system includes several different
automatic graph layout algorithms, such as for circular and tree layout; it also supports full
interactive manipulation of graph drawings. Finally, the system provides facilities for
building graph-based user interfaces for application programs, which have been used in
conjunction with the Sipe planner, the Gister evidential reasoner, a scheduler for the Hubble
Space Telescope, and the EcoCyc encyclopedia of biochemical pathways. A number of
groups within the AIC and SRI are using the Grasper-CL system in a variety of projects.
This talk will describe the system in detail for people who wish to understand its capabilities
better or who are thinking of using it for other projects. This talk is also an opportunity for
the audience to shape the future directions of the system: What additional capabilities
should be added? Would users like more direct input in how the system evolves? Should we
attempt to find funding for further development of the system and research on such issues as
graph layout algorithms?

Keywords: graphs 

P. D. Karp, S. M. Paley, and I. Greenberg, "A storage system for scalable knowledge representation," in
Proceedings of the Third International Conference on Information and Knowledge Management (N.
Adam, ed.), 1994.

Abstract: Twenty years of AI research in knowledge representation has produced frame 
knowledge representation systems (FRSs) that incorporate a number of important advances.
However. FRSs lack two important capabilities that prevent them from scaling up to 
realistic applications: they cannot provide high-speed access to large knowledge bases 
(KBs). and they do not support shared, concurrent KB access by multiple users. Our 
research investigates the hypothesis that one can employ an existing database management 
system (DBMS) as a storage subsystem for an FRS, to provide high-speed access to large,
shared KBs. We describe the design and implementation of a general storage system that
incrementally loads referenced frames from a DBMS, and saves modified frames back to the
DBMS, for two different FRSs: LOOM and THEO. We also present experimental results
showing that the performance of our prototype storage subsystem exceeds that of flat files
for simulated applications that reference or update up to one third of the frames from a large
LOOM KB. 

Keywords: knowledge representation 

P. Karp and S. M. Paley, "Representations of metabolic knowledge: Pathways," in Proceedings of the
Second International Conference on Intelligent Systems for Molecular Biology (R. Altman, D. Brutlag,
P. Karp, R. Lathrop, and D. Searls, eds.), (Menlo Park, CA), AAAI Press, 1994.

Abstract: The automatic generation of drawings of metabolic pathways is a challenging 
problem that depends intimately on exactly what information has been recorded for each
pathway, and on how that information is encoded. The chief contributions of the paper are a 
minimized representation for biochemical pathways called the predecessor list, and 
inference procedures for converting the predecessor list into a pathway-graph representation 
that can serve as input to a pathway-drawing algorithm. The predecessor list has several 
advantages over the pathway graph, including its compactness and its lack of redundancy. 
The conversion between the two representations can be formulated as both a 
constraint-satisfaction problem and a logical inference problem, whose goal is to assign 
directions to reactions, and to determine which are the main chemical compounds in the 
reaction. We describe a set of production rules that solves this inference problem. We also 
present heuristics for inferring whether the exterior compounds that are substrates of
reactions at the periphery of a pathway are side or main compounds. These techniques were
evaluated on 18 metabolic pathways from the EcoCyc knowledge base.

Keywords: bioinformatics, metabolism
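
To make the contrast between the predecessor list and the pathway graph concrete, here is a minimal sketch. It assumes, since the abstract does not spell out the format, that a predecessor list is a list of (reaction, predecessor-reaction) pairs. It also replaces the paper's production rules with a much simpler stand-in: the compound linking two consecutive reactions is taken to be whatever compounds they share. The function names and the three-reaction example are illustrative only, and only linear pathways are handled.

```python
def build_pathway_graph(predecessor_list, reaction_compounds):
    """Expand (reaction, predecessor) pairs into explicit graph edges.

    Each edge is (from_reaction, to_reaction, shared_compounds). Taking the
    shared compounds as the link is a simplification, not the inference
    procedure described in the paper.
    """
    edges = []
    for reaction, predecessor in predecessor_list:
        shared = reaction_compounds[predecessor] & reaction_compounds[reaction]
        edges.append((predecessor, reaction, sorted(shared)))
    return edges


def linear_order(predecessor_list):
    """Order the reactions of a linear pathway from first to last."""
    successor = {pred: rxn for rxn, pred in predecessor_list}
    has_predecessor = {rxn for rxn, _ in predecessor_list}
    start = next(r for r in successor if r not in has_predecessor)
    order = [start]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order


# Toy three-step pathway; R1..R3 are placeholder reaction names.
reaction_compounds = {
    "R1": {"glucose", "ATP", "G6P", "ADP"},
    "R2": {"G6P", "F6P"},
    "R3": {"F6P", "ATP", "FBP", "ADP"},
}
predecessor_list = [("R2", "R1"), ("R3", "R2")]

edges = build_pathway_graph(predecessor_list, reaction_compounds)
order = linear_order(predecessor_list)
```

The predecessor list here is two pairs, while the derived graph carries the ordering and the linking compounds explicitly. R1 and R3 both involve ATP and ADP, but no edge joins them because the predecessor list never pairs them.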
P. D. Karp, "Design methods for scientific hypothesis formation and their application to molecular
biology," Machine Learning, vol. 12, pp. 89-116, 1993.

Abstract: Hypothesis-formation problems occur when the outcome of an experiment as 
predicted by a scientific theory does not match the outcome observed by a scientist. The
problem is to modify the theory, and/or the scientist's conception of the initial conditions of
the experiment, such that the prediction agrees with the observation. I treat hypothesis
formation as a design problem. A program called HypGene designs hypotheses by reasoning
backward from its goal of eliminating the difference between prediction and observation.
This prediction error is eliminated by design operators that are applied by a planning system.
The synthetic, goal-directed application of these operators should prove more efficient than
past generate-and-test approaches to hypothesis generation. HypGene uses heuristic search
to guide a generator that is focused on the errors in a prediction. The advantages of the
design approach to hypothesis-formation over the generate-and-test approach are analogous
to the advantages of dependency-directed backtracking over chronological backtracking.
These hypothesis-formation methods were developed in the context of a historical study of a
scientific research program in molecular biology. This paper describes in detail the results of
applying the HypGene program to several hypothesis-formation problems identified in the
historical study. HypGene found most of the same solutions as did the biologists, which
demonstrates that it is capable of solving complex, real-world hypothesis-formation 
problems. 

P. Karp and M. Riley, "Representations of metabolic knowledge," in Proceedings of the First
International Conference on Intelligent Systems for Molecular Biology (L. Hunter, D. Searls, and J.
Shavlik, eds.), (Menlo Park, CA), pp. 207-215, AAAI Press, 1993.

Abstract: Construction of electronic repositories of metabolic information is an 
increasingly active area of research. Encoding detailed knowledge of a complex biological
domain requires finely honed representations. We survey representations used for several
metabolic databases, including EcoCyc, and reach the following conclusions. 
Representation of the metabolism must distinguish enzyme classes from individual 
enzymes, because there is not a one-to-one mapping from enzymes to the reactions they 
catalyze. Individual enzymes must be represented explicitly as proteins, e.g., by encoding
their subunit structure. The species variation of metabolism must be represented. So must
the substrate specificity of enzymes, which may be treated in several ways.

P. D. Karp, "A qualitative biochemistry and its application to the regulation of the tryptophan operon,"
in Artificial Intelligence and Molecular Biology (L. Hunter, ed.), Menlo Park, CA: AAAI Press, 1993.

8, no. 4, pp. 347-357, 1992.

Abstract: This paper describes a publicly available knowledge base of the chemical
compounds involved in intermediary metabolism. We consider the motivations for
constructing a knowledge base of metabolic compounds, the methodology by which it was